Let’s discuss the high-level and detailed design of a newsfeed system based on the requirements discussed in the previous lesson.

High-level design of a newsfeed system#

Primarily, the newsfeed system is responsible for the following two tasks:

  1. Feed generation: The newsfeed is generated by aggregating friends’ and followers’ posts (or feed items) based on some ranking mechanism.

  2. Feed publishing: When a feed is published, the relevant data is written into the cache and database. This data could be textual or any media content. A post containing the data from friends and followers is populated to a user’s newsfeed.

Let’s move to the high-level design of our newsfeed system. It consists of the above two essential parts, shown in the following figure:

Load balancer 
Load balancer 
Clients
Clients
Web servers
Web servers
Notification service
Notification ser...
Newsfeed generation service
Newsfeed generation se...
Newsfeed publishing service
Newsfeed publishi...
Post service
Post service
Blob storage
Blob storage
Newsfeed cache
Newsfeed cache
Postcache
Postcache
Post database
Post database
1
1
2
2
3
3
4
4
5
5
6
6
7
7
Viewer does not support full SVG 1.1
High-level design of the Newsfeed system

Let’s discuss the main components shown in the high-level design:

  1. User(s): Users can make a post with some content or request their newsfeed.

  2. Load balancer: It redirects traffic to one of the web servers.

  3. Web servers: The web servers encapsulate the back-end services and work as an intermediate layer between users and various services. Apart from enforcing authentication and rate-limiting, web servers are responsible to redirect traffic to other back-end services.

  4. Notification service: It informs the newsfeed generation service whenever a new post is available from one’s friends or followers, and sends a push notification.

  5. Newsfeed generation service: This service generates newsfeeds from the posts of followers/friends of a user and keeps them in the newsfeed cache.

  6. Newsfeed publishing service: This service is responsible for publishing newsfeeds to a users’ timeline from the newsfeed cache. It also appends a thumbnail of the media content from the blob storage and its link to the newsfeed intended for a user.

  7. Post-service: Whenever a user requests to create a post, the post-service is called, and the created post is stored on the post database and corresponding cache. The media content in the post is stored in the blob storage.

API design#

APIs are the primary ways for clients to communicate with servers. Usually, newsfeed APIs are HTTP-based that allow clients to perform actions, including posting a status, retrieving newsfeeds, adding friends, and so on. We aim to generate and get a user’s newsfeed; therefore, the following APIs are essential:

Generate user’s newsfeed#

The following API is used to generate a user’s newsfeed:

generateNewsfeed(user_id)

This API takes users’ IDs, and determines their friends and followers. This API generates newsfeeds that consist of several posts. Since internal system components use this API, therefore, it can be called offline to pre-generate newsfeeds for users. The pre-generated newsfeeds are stored on persistent storage and associated cache.

The following parameter is used in this API call:

Parameter

Description

user_id

A unique identification of the user for whom the newsfeed is generated.

Get user’s newsfeed#

The following API is used to get a user’s newsfeed:

getNewsfeed(user_id, count)

The getNewsfeed(.) API call returns a JSON object consisting of a list of posts.

The following parameters are used for this API:

Parameter

Description

user_id

A unique identification of the user for whom the system will fetch the newsfeed.

count

The number of feed items (posts) that will be retrieved per request.

Storage schema#

The database relations for the newsfeed system are as follows:

  • User: This relation contains data about a user. A user can also be a follower or friend of other users.

  • Entity: This relation stores data related to any entity, such as pages, groups, and so on.

  • Feed_item: The data about posts created by users is stored in this relation.

  • Media: The information about the media content is stored in this relation.

User
User
User_ID:            varchar(32) (PK)
User_ID:            varchar(32) (PK)
Name:                varchar(32)
Name:                varchar(32)
Email:                 varchar (32)
Email:                 varchar (32)
CreationDate:    datetime
CreationDate:    datetime
Mobile:               varchar (32)
Mobile:               varchar (32)
LastLogin:          datetime
LastLogin:          datetime
Entity (Page or Group)
Entity (Page or Group)
Entity_ID:             varchar (PK)
Entity_ID:             varchar (PK)
Name:                   varchar(32)
Name:                   varchar(32)
Description:           varchar(512)
Description:           varchar(512)
CreationDate:       datetime
CreationDate:       datetime
Creator:                User_ID (FK)
Creator:                User_ID (FK)
Feed_Item
Feed_Item
Feed_Item_ID:     varchar (PK)
Feed_Item_ID:     varchar (PK)
Creator:                 User_ID(FK)
Creator:                 User_ID(FK)
Content:                varchar(512)
Content:                varchar(512)
Entity_ID:              Entity_ID (FK)
Entity_ID:              Entity_ID (F...
CreationDate:        datetime
CreationDate:        datetime
Likes_count:          int
Likes_count:          int
Media_ID:              int
Media_ID:              int
Media
Media
Media_ID:           int (PK)
Media_ID:           int (PK)
Desciption:          varchar (256)
Desciption:          varchar (256)
Path:                   varchar(256)
Path:                   varchar(256)
Views_count:      int
Views_count:      int
CreationDate:     datetime
CreationDate:     datetime
Viewer does not support full SVG 1.1
The database schema for the newsfeed system

We use a graph database to store relationships between users, friends, and followers. For this purpose, we follow the property graph model. We can think of a graph database consisting of two relational tables:

  1. For vertices that represent users
  2. For edges that denotes relationships among them

Therefore, we follow a relational schema for the graph store, as shown in the following figure. The schema uses the PostgreSQL JSON datatype to store the properties of each vertex (user) or edge (relationship).

An alternative representation of a User can be shown in the graph database below. Where the Users_ID remains the same and attributes are stored in a JSON file format.

User
User
Users_ID: int (PK)
Users_ID: int (PK)
properties : JSON
properties : JSON
Relationship
Relationship
Relation_ID: int (PK)
Relation_ID: int (PK)
Relation_from: int  REFERENCES User (User_ID)
Relation_from: int  REFERENCES User...
Relation_towards: int REFERENCES User (Users_ID)
Relation_towards: int REFERENCES Use...
Label: varchar (32)
Label: varchar (32)
properties: JSON
properties: JSON
Viewer does not support full SVG 1.1
Schema of the graph database to hold relationships between users

The following figure demonstrates how graph can be represented using the relational schema:

782 (User's ID)
782 (User's ID)
399 (User's ID)
399 (User's ID)
Relation_ID: 111



Label: Following
Relation_ID: 111...
For the above simple graph,  Users_ID 782 and 399 the Relationship table will have the following entries.
For the above simple graph,  Users_ID 782 and 399 the Relationship table will hav...
User_ID: 782
User_ID: 782
User_ID: 399
User_ID: 399
Relation_ID
Relation_ID
Relation_from
Relation_from
Relation_towards
Relation_towards
Label
Label
Properties
Properties
111
111
782
782
399
399
Following
Following
Some description as a json datatype
Some descriptio...
Relationship
Relationship
Viewer does not support full SVG 1.1
A graph between two users consisting of two vertices and an edge

Detailed design#

Let’s explore the design of the newsfeed system in detail.

As discussed earlier, there are two parts of the newsfeed system; newsfeed publishing and newsfeed generation. Therefore, we’ll discuss both parts, starting with the newsfeed generation service.

The newsfeed generation service#

Newsfeed is generated by aggregated posts (or feed items) from the user’s friends, followers, and other entities (pages and groups).

In our proposed design, the newsfeed generation service is responsible for generating the newsfeed. When a request from a user (say Alice) to retrieve a newsfeed is received at web servers, the web server either:

  • Calls the newsfeed generation service to generate feeds because some users don’t often visit the platform, so their feeds are generated on their request.
  • It fetches the pre-generated newsfeed for active users who visit the platform frequently.

The following steps are performed in sequence to generate a newsfeed for Alice:

  1. The newsfeed generation service retrieves IDs of all users and entities that Alice follows from the graph database.
  2. When the IDs are retrieved from the graph database, the next step is to get their friends’ (followers and entities) information from the user cache, which is regularly updated whenever the users database gets updated/modified.
  3. In this step, the service retrieves the latest, most popular, and relevant posts for those IDs from the post cache. These are the posts that we might be able to display on Alice’s newsfeed.
  4. The ranking service ranks posts based on their relevance to Alice. This represents Alice’s current newsfeed.
  5. The newsfeed is stored in the the newsfeed cache from which top NN posts are published to Alice’s timeline. (The publishing process is discussed in detail in the following section).
  6. In the end, whenever Alice reaches the end of her timeline, the next top NN posts are fetched to her screen from the newsfeed cache.

The process is illustrated in the following figure:

Load balancer 
Load balancer 
Clients
Clients
Web servers
Web servers
Newsfeed generation service
Newsfeed generat...
User cache
User cache
Graph database
Graph database
User database
User database
Post cache
Post cache
Post database
Post database
Newsfeed cache servers holding constructed feeds and ranked feeds
Newsfeed cache servers h...
Ranking service
Ranking service
1
1
2
2
3
3
4
4
CDN
CDN
5
5
Viewer does not support full SVG 1.1
Working of the newsfeed generation service

Point to Ponder

Question

The creation and storage of newsfeeds for each user in the cache requires an enormous amount of memory (step 5 in the above section). Is there any way to reduce this memory consumption?

Hide Answer

A memory-efficient way would be to store just the mapping between users and their corresponding posts in a table in the cache, that is., < Post_ID, User_ID>. During the feed publishing phase, the system will retrieve posts from the post database and generate the newsfeed for a user who follows another user with User_ID.

The newsfeed publishing service#

At this stage, the newsfeeds are generated for users from their respective friends, followers, and entities and are stored in the form of <Post_ID, User_ID> in the news feed cache.

Now the question is how the newsfeeds generated for Alice will be published to her timeline?

The newsfeed publishing service fetches a list of post IDs from the newsfeed cache. The data fetched from the newsfeed cache is a tuple of post and user IDs, that is., <Post_ID, User_ID>. Therefore, the complete data about posts and users are retrieved from the users and posts cache to create an entirely constructed newsfeed.

In the last step, the fully constructed newsfeed is sent to the client (Alice) using one of the fan-out approaches. The popular newsfeed and media content are also stored in CDN for fast retrieval.

Suppose if our service receives billions of requests at once and the system starts generating live newsfeeds. This situation would put an enormous number of users on hold and could crash servers. To avoid this, we can assign dedicated servers that will continuously rank and generate newsfeeds and store them in the newsfeed database and memory. Therefore, upon a request for new posts from a user, our system will provide the pre-generated feeds.

Load balancer 
Load balancer 
Clients
Clients
Web servers
Web servers
Newsfeed publishing service
Newsfeed publishin...
User cache
User cache
Newsfeed cache
Newsfeed cache
User database
User database
Blob storage
Blob storage
Post database
Post database
Post cache
Post cache
CDN
CDN
Viewer does not support full SVG 1.1
The newsfeed publishing service in action

Point to Ponder

Question

How is the newsfeed of the friends and followers updated when a user creates a new post?

Show Answer

The newsfeed ranking service#

Often we see the relevant and important posts on the top of our newsfeed whenever we log in to our social media accounts. This ranking involves multiple advanced ranking and recommendation algorithms.

In our design, the newsfeed ranking service consists of these algorithms working on various features, such as, a user’s past history, likes, dislikes, comments, clicks, and many more. These algorithms also perform the following functions:

  • Select “candidates” posts to show in a newsfeed.
  • Eliminate posts including misinformation or clickbait from the candidate posts.
  • Create a list of friends a user frequently interacts with.
  • Choose topics on which a user spent more time.

The ranking system considers all the above points to predict relevant and important posts for a user.

Load balancer 
Load balancer 
Clients
Clients
Web servers
Web servers
Newsfeed publishing service
Newsfeed publishin...
User cache
User cache
User database
User database
Blob storage
Blob storage
Post database
Post database
Cache servers holding constructed feeds and ranked feeds
Cache servers holding...
Ranking service
Ranking service
Post cache
Post cache
CDN
CDN
Viewer does not support full SVG 1.1
The ranked newsfeeds are stored in the cache servers

Posts ranking and newsfeed construction#

The post database contains posts published by different users. Assume that there are 10 posts in the database published by 5 different users. We aim to rank only 4 posts out of 10 for a user (say Bob) who follows those five different users. We perform the following to rank each post and create a newsfeed for Bob:

  1. Various features such as likes, comments, shares, category, duration, etc and so on, are extracted from each post.
  2. Based on Bob’s previous history, stored in the user database, the relevance is calculated for each post via different ranking and machine learning algorithms.
  3. A relevance score is assigned, say from 1 to 5, where 1 shows the least relevant post and 5 means highly relevant post.
  4. The top 4 posts are selected out of 10 based on the assigned scores.
  5. The top 4 posts are combined and presented on Bob’s timeline in decreasing order of the score assigned.

The following figure shows the top 4 posts published on Bob’s timeline:

Relevance score = 5
Relevance score = 5
Relevance score = 4
Relevance score = 4
Relevance score = 3
Relevance score = 3
Relevance score = 2
Relevance score = 2
Viewer does not support full SVG 1.1
A newsfeed consisting of top 4 posts based on the relevance scores

Newsfeed ranking with various machine learning and ranking algorithms is a computationally intensive task. In our design, the ranking service ranks posts and constructs newsfeeds. This service consists of big-data processing systems that might utilize specialized hardware like graphics processing units (GPUs) and tensor processing units (TPUs).

Note: According to Facebook:

“For each person on Facebook, we need to evaluate thousands of features to determine what that person might find most relevant to predict what each of those people wants to see in their feed.”

This implies that we need enormous computational power and sophisticated learning algorithms to incorporate all the features to get good quality feed in a reasonably short time.

Putting everything together#

The following figure combines all the services related to the detailed design of the newsfeed system:

Load balancer 
Load balancer 
Clients
Clients
Web servers
Web servers
Newsfeed service
Newsfeed service
Notification service
Notification service
Newsfeed publishing service
Newsfeed publishi...
Post service
Post service
Blob storage
Blob storage
User cache
User cache
Post cache
Post cache
Graph database
Graph database
User database
User database
Post database
Post database
Newsfeed cache servers holding constructed feeds and ranked feeds
Newsfeed cache servers h...
Ranking service
Ranking service
CDN
CDN
Newsfeed database
Newsfeed database
Viewer does not support full SVG 1.1
The detailed design of newsfeed system

In this lesson, we discussed the design of the newsfeed system, its database schema, and the newsfeed ranking system. In the next lesson, we’ll evaluate our system’s requirements.

Requirements of a Newsfeed System’s Design
Evaluation of a Newsfeed System’s Design
Mark as Completed